AITopics | target level

Collaborating Authors

target level

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The QCET Taxonomy of Standard Quality Criterion Names and Definitions for the Evaluation of NLP Systems

Belz, Anya, Mille, Simon, Thomson, Craig

arXiv.org Artificial IntelligenceSep-29-2025

Prior work has shown that two NLP evaluation experiments that report results for the same quality criterion name (e.g. Fluency) do not necessarily evaluate the same aspect of quality, and the comparability implied by the name can be misleading. Not knowing when two evaluations are comparable in this sense means we currently lack the ability to draw reliable conclusions about system quality on the basis of multiple, independently conducted evaluations. This in turn hampers the ability of the field to progress scientifically as a whole, a pervasive issue in NLP since its beginning (Sparck Jones, 1981). It is hard to see how the issue of unclear comparability can be fully addressed other than by the creation of a standard set of quality criterion names and definitions that the several hundred quality criterion names actually in use in the field can be mapped to, and grounded in. Taking a strictly descriptive approach, the QCET Quality Criteria for Evaluation Taxonomy derives a standard set of quality criterion names and definitions from three surveys of evaluations reported in NLP, and structures them into a hierarchy where each parent node captures common aspects of its child nodes. We present QCET and the resources it consists of, and discuss its three main uses in (i) establishing comparability of existing evaluations, (ii) guiding the design of new evaluations, and (iii) assessing regulatory compliance.

computational linguistic, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2509.22064

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(22 more...)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Law (1.00)
Government (1.00)
Leisure & Entertainment (0.92)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)

Add feedback

Automatically Suggesting Diverse Example Sentences for L2 Japanese Learners Using Pre-Trained Language Models

Benedetti, Enrico, Aizawa, Akiko, Boudin, Florian

arXiv.org Artificial IntelligenceJun-5-2025

Providing example sentences that are diverse and aligned with learners' proficiency levels is essential for fostering effective language acquisition. This study examines the use of Pre-trained Language Models (PLMs) to produce example sentences targeting L2 Japanese learners. We utilize PLMs in two ways: as quality scoring components in a retrieval system that draws from a newly curated corpus of Japanese sentences, and as direct sentence generators using zero-shot learning. We evaluate the quality of sentences by considering multiple aspects such as difficulty, diversity, and naturalness, with a panel of raters consisting of learners of Japanese, native speakers -- and GPT-4. Our findings suggest that there is inherent disagreement among participants on the ratings of sentence qualities, except for difficulty. Despite that, the retrieval approach was preferred by all evaluators, especially for beginner and advanced target proficiency, while the generative approaches received lower scores on average. Even so, our experiments highlight the potential for using PLMs to enhance the adaptability of sentence suggestion systems and therefore improve the language learning journey.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2506.0358

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
(16 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry: Education > Curriculum > Subject-Specific Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Add feedback

From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation

Malik, Ali, Mayhew, Stephen, Piech, Chris, Bicknell, Klinton

arXiv.org Artificial IntelligenceJun-5-2024

We study the problem of controlling the difficulty level of text generated by Large Language Models (LLMs) for contexts where end-users are not fully proficient, such as language learners. Using a novel framework, we evaluate the effectiveness of several key approaches for this task, including few-shot prompting, supervised finetuning, and reinforcement learning (RL), utilising both GPT-4 and open source alternatives like LLama2-7B and Mistral-7B. Our findings reveal a large performance gap between GPT-4 and the open source models when using prompt-based strategies. However, we show how to bridge this gap with a careful combination of finetuning and RL alignment.

controlerror, dataset, proficiency level, (13 more...)

arXiv.org Artificial Intelligence

2406.0303

Country:

Asia > Singapore (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(17 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

Gui, Yu, Jin, Ying, Ren, Zhimei

arXiv.org Machine LearningMay-21-2024

Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values. For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet a user-specified alignment criterion. It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent threshold, certifying their corresponding outputs as trustworthy. Through applications to question answering and radiology report generation, we demonstrate that our method is able to accurately identify units with trustworthy outputs via lightweight training over a moderate amount of reference data. En route, we investigate the informativeness of various features in alignment prediction and combine them with standard models to construct the alignment predictor.

llama-2-13b-chat, power fdr and power fdr, target level, (9 more...)

arXiv.org Machine Learning

2405.10301

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.70)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Conformal Inference for Online Prediction with Arbitrary Distribution Shifts

Gibbs, Isaac, Candès, Emmanuel

arXiv.org Artificial IntelligenceOct-5-2023

We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here we correct this issue and develop a novel procedure with provably small regret over all local time intervals of a given width. We achieve this by modifying the adaptive conformal inference (ACI) algorithm of Gibbs and Cand\`{e}s (2021) to contain an additional step in which the step-size parameter of ACI's gradient descent update is tuned over time. Crucially, this means that unlike ACI, which requires knowledge of the rate of change of the data-generating mechanism, our new procedure is adaptive to both the size and type of the distribution shift. Our methods are highly flexible and can be used in combination with any baseline predictive algorithm that produces point estimates or estimated quantiles of the target without the need for distributional assumptions. We test our techniques on two real-world datasets aimed at predicting stock market volatility and COVID-19 case counts and find that they are robust and adaptive to real-world distribution shifts.

distribution shift, dtaci, prediction, (14 more...)

arXiv.org Artificial Intelligence

2208.08401

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Banking & Finance > Trading (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Benchmarking Bayesian neural networks and evaluation metrics for regression tasks

Staber, Brian, Da Veiga, Sébastien

arXiv.org Artificial IntelligenceFeb-14-2023

Due to the growing adoption of deep neural networks in many fields of science and engineering, modeling and estimating their uncertainties has become of primary importance. Despite the growing literature about uncertainty quantification in deep learning, the quality of the uncertainty estimates remains an open question. In this work, we assess for the first time the performance of several approximation methods for Bayesian neural networks on regression tasks by evaluating the quality of the confidence regions with several coverage metrics. The selected algorithms are also compared in terms of predictivity, kernelized Stein discrepancy and maximum mean discrepancy with respect to a reference posterior in both weight and function space. Our findings show that (i) some algorithms have excellent predictive performance but tend to largely over or underestimate uncertainties (ii) it is possible to achieve good accuracy and a given target coverage with finely tuned hyperparameters and (iii) the promising kernel Stein discrepancy cannot be exclusively relied on to assess the posterior approximation. As a by-product of this benchmark, we also compute and visualize the similarity of all algorithms and corresponding hyperparameters: interestingly we identify a few clusters of algorithms with similar behavior in weight space, giving new insights on how they explore the posterior distribution.

artificial intelligence, coverage probability, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2206.06779

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Terminating-Knockoff Filter: Fast High-Dimensional Variable Selection with False Discovery Rate Control

Machkour, Jasin, Muma, Michael, Palomar, Daniel P.

arXiv.org Machine LearningOct-12-2021

We propose the Terminating-Knockoff (T-Knock) filter, a fast variable selection method for high-dimensional data. The T-Knock filter controls a user-defined target false discovery rate (FDR) while maximizing the number of selected true positives. This is achieved by fusing the solutions of multiple early terminated random experiments. The experiments are conducted on a combination of the original data and multiple sets of randomly generated knockoff variables. A finite sample proof based on martingale theory for the FDR control property is provided. Numerical simulations show that the FDR is controlled at the target level while allowing for a high power. We prove under mild conditions that the knockoffs can be sampled from any univariate distribution. The computational complexity of the proposed method is derived and it is demonstrated via numerical simulations that the sequential computation time is multiple orders of magnitude lower than that of the strongest benchmark methods in sparse high-dimensional settings. The T-Knock filter outperforms state-of-the-art methods for FDR control on a simulated genome-wide association study (GWAS), while its computation time is more than two orders of magnitude lower than that of the strongest benchmark methods.

artificial intelligence, machine learning, t-knock filter, (18 more...)

arXiv.org Machine Learning

2110.06048

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
Asia > China > Hong Kong (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (0.88)
Research Report > New Finding (0.67)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Logistic Regression: A Concise Technical Overview

#artificialintelligenceJan-26-2019, 01:34:50 GMT

We have all heard of Linear Regression. It's what we all learn in our first semester of statistics. It is our default technique when we have a continuous outcome variable. A refresher on Linear Regression can be found here. But other times we have Categorical Outcomes.

linear regression, logistic regression, regression, (15 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.90)
Research Report > Experimental Study (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Hierarchical Video Understanding

Mahdisoltani, Farzaneh, Memisevic, Roland, Fleet, David

arXiv.org Machine LearningSep-3-2018

We introduce a hierarchical architecture for video understanding that exploits the structure of real world actions by capturing targets at different levels of granularity. We design the model such that it first learns simpler coarse-grained tasks, and then moves on to learn more fine-grained targets. The model is trained with a joint loss on different granularity levels. We demonstrate empirical results on the recent release of Something-Something dataset, which provides a hierarchy of targets, namely coarse-grained action groups, fine-grained action categories, and captions. Experiments suggest that models that exploit targets at different levels of granularity achieve better performance on all levels.

artificial intelligence, different level, machine learning, (17 more...)

arXiv.org Machine Learning

1809.03316

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback